Variation in noun and pronoun frequencies in a sociohistorical corpus of English

Authors

  • Tanja Säily
  • Terttu Nevalainen
  • Harri Siirtola
Abstract

Many corpus linguists make the tacit assumption that part-of-speech frequencies remain constant during the period of observation. In this article, we will consider two related issues: (1) the reliability of part-of-speech tagging in a diachronic corpus, and (2) shifts in tag ratios over time. The purpose is both to serve the users of the corpus by making them aware of potential problems, and to obtain linguistically interesting results. We use noun and pronoun ratios as diagnostics indicative of opposing stylistic tendencies, but we are also interested in testing whether any observed variation in the ratios could be accounted for in sociolinguistic terms. The material for our study is provided by the Parsed Corpus of Early English Correspondence (PCEEC), which consists of 2.2 million running words covering the period 1415–1681. The part-of-speech tagging of the PCEEC has its problems, which we test by reannotating the corpus according to our own principles and comparing the two annotations. While there are quite a few changes, the mean percentage of change is very small for both nouns and pronouns. As for variation over time, the mean frequency of nouns declines somewhat, while the mean frequency of pronouns fluctuates with no clear diachronic trend. However, women consistently use more pronouns than men, while men use more nouns than women. More fine-grained distinctions are needed to uncover further regularities and possible reasons for this variation.
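
The comparison of two annotations described in the abstract can be illustrated with a minimal sketch. It normalizes noun and pronoun counts per 1,000 running words and computes the percentage change between an original and a revised tagging. The tag groupings, function names, and toy data below are assumptions made for illustration; they are not taken from the article, whose reannotation principles are not given here.

```python
# Assumed tag groupings, for illustration only; the article's actual
# tag inventory and reannotation principles are not stated in the abstract.
NOUN_TAGS = {"N", "NS", "NPR", "NPRS"}
PRONOUN_TAGS = {"PRO", "PRO$"}


def per_thousand(tags, tag_set):
    """Frequency of tags belonging to tag_set per 1,000 running words."""
    if not tags:
        return 0.0
    hits = sum(1 for t in tags if t in tag_set)
    return 1000.0 * hits / len(tags)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("old frequency is zero; change is undefined")
    return 100.0 * (new - old) / old


# Toy data: the same ten-word text under two hypothetical annotations.
# The only difference is the last token, retagged from noun to adjective.
original = ["PRO", "VBD", "D", "N", "P", "NPR", "CONJ", "PRO", "VBP", "N"]
revised = ["PRO", "VBD", "D", "N", "P", "NPR", "CONJ", "PRO", "VBP", "ADJ"]

noun_old = per_thousand(original, NOUN_TAGS)
noun_new = per_thousand(revised, NOUN_TAGS)
pron_old = per_thousand(original, PRONOUN_TAGS)
pron_new = per_thousand(revised, PRONOUN_TAGS)

print(f"nouns:    {noun_old:.1f} -> {noun_new:.1f} per 1,000 words "
      f"({percent_change(noun_old, noun_new):+.1f}%)")
print(f"pronouns: {pron_old:.1f} -> {pron_new:.1f} per 1,000 words "
      f"({percent_change(pron_old, pron_new):+.1f}%)")
```

In a real study the same calculation would presumably be run per text or per writer and then averaged, which is what a "mean percentage of change" suggests; the sketch covers only the single-text case.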

Similar resources

Disambiguation preferences and corpus frequencies in noun phrase conjunction

Gibson and Schütze (1999) showed that on-line disambiguation preferences do not always mirror corpus frequencies. When presented with a syntactic ambiguity involving the conjunction of a noun phrase to three possible attachment sites, participants were faster to read attachments to the first site than attachments to the second one, although the latter were shown to be more frequent in text co...

Distributional Identification of Non-Referential Pronouns

We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead nonreferential. We extract the surrounding textual context of the pronoun and gather, from a large corpus, the distribution of words that occur within that context. We learn to reliably classify these distributions as representing either referential or non-referential pronou...

Acquisition of cleft structures in L1 and L2

The present study aims at exploring the processing difficulty of cleft structures as a type of relative clause for EFL and Persian as first language learners. The impact of head nouns with various functions as well as that of embedding on the processing of Persian and English cleft structures has been investigated in the present study. The participants were 68 Iranian male and female students...

A Comparative Analysis of Institutional Identities in a Corpus of English and Persian News Interviews

Institutional identity as a concept in CDA is a field of study that deals with the identities that individuals in institutions obtain, one that merits deep research attention. News interviews as institutional instances can be analyzed based on the impersonal structures because interviewees see themselves as part of the institution and they may not take responsibility when they encounter problem...

Iranian EFL learners’ knowledge of overt pronoun constraint in English

The present study aims to investigate Iranian L2 speakers' knowledge of the overt pronoun constraint (OPC) in English. It also aims to examine L2 learners' understanding of Universal Grammar (Chomsky 1981, 1986, 2000, 2001) and whether it is common to all languages. Specifically, this study takes a new look at the L2 acquisition of knowledge of the overt pronoun constraint (Montalbetti 1984) by Persian ...

Journal title:
  • LLC

Volume 26  Issue 

Pages  -

Publication date 2011